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FOREWORD 


This report was prepared by Advanced Technology, Inc. (ATI) 
under contract NAS8-37812 (WBS 1.3. 4. 2) funding to assess 

quantitative methods and measures for monitoring Launch Commit 
Criteria (LCC) performance measurement trends. The specific 
Technical Directive (TD-002-342-444-0000) directed that a 
statistical performance trending analysis pilot study, utilizing 
four selected SRB, SRM, ET, and SSME Shuttle measurement types from 
the five missions prior to 51-L, be processed and compared to 
STS-26 mission data. 


To accommodate this technical guidance, under the statistical 
uncertainties of small sample theory, representative LCC 
measurement types were selected by Marshall Space Flight Center 
(MSFC) personnel from the Reliability and Maintainability (R&M) 
Office, with contractor support from Boeing Aerospace Operations 
(BAO) , Calspan Corporation, andlATI. The guideline criteria 
utilized in this selection process placed high priority on the 
statistical stability of the candidate measurements, in opposition 
to measurement criticality, to insure that the selected 
measurements were representative of the complete measurement 
spectrum. As a result, the following measurement types were 
selected: 


1) Solid Rocket Booster (SRB) Auxiliary Power Unit (APU) 
turbine speed (rpm) 

2) External Tank (ET) Liquid Hydrogen (LH 2 ) ullage 
pressure (psi) 

3) Space Shuttle Main Engine (SSME) Low Pressure Fuel 
Turbopump (LPFTP) discharge temperature (°R) 

4) Range Safety Switch (RSS) safe and arm device (event) . 


These measurement types comprise three analog and one discrete 
class of measurements which range from highly volatile oscillations 
to a simple on-off signal. 

Once the raw data coordinates were obtained, each set of 
measurements within a measurement type was processed to obtain 
statistical confidence bounds and mean data profiles for each of 
the selected measurement types. STS-26 measurements were compared 
to the statistical data base profiles to verify the statistical 
capability of assessing: 

1) occurrence of data trend anomalies and 

2) abnormal time-varying operational conditions associated 
with data amplitude and phase shifts. 
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1.0 INTRODUCTION 


1.1 OBJECTIVE 

To accommodate the tasking objectives of the Phase I Launch 
Commit Criteria (LCC) Trending Analysis (LCCTA) effort, an LCC 
performance trending plan was written which provided a sequential 
overview of the major steps involved in this prototype development. 
In summary, the development objectives of this proof-of -concept 
were oriented toward the: 

• Methodology to be used for obtaining the required data 
to support development of suitable statistical confidence 
bounds and mean data profiles of four selected Shuttle 
LCC measurement types from the five missions prior to 
51-L. 

• Approach to be used in converting trending requirements 
and issues into relevant operational algorithms suitable 
for LCC trending evaluations. 

1.2 SCOPE 

To help direct the initial focus of the prototype effort, 
current trending techniques, requirements and issues were reviewed 
to identify the critical areas of concern. This review indicated 
that the primary focus of this effort should be directed toward the 
analytical formulation of an efficient "distribution-free” 
probability density function (pdf) and associated unbiased mean 
estimator for characterizing the time-varying statistical data base 
profiles for each selected measurement type. 

This primary analytical focus resulted from the acknowledged 
realization that time-varying statistical data variations contained 
in historical successful mission measurements must be characterized 
by pdf's that are capable of representing the extreme variations 
of such data. The problem of selecting a particular form or type 
of pdf which best fits an empirical data sample has long been an 
open question in descriptive statistics. 

In theory, this same problem carries over to the Gaussian or 
Laplacian normal probability distribution. That is, no physical 
distribution can rigorously be a normal universe, as this 
distribution extends to infinity in both directions. 

A similar statement applies to any universe whose distribution 
extends to infinity in either direction. For this reason, it is 
often convenient to approximate the distribution from a random data 
sample of the parent population. Nevertheless, the a priori 
requirement of specifying a distribution (form) remains an 
unanswered question. As a result, a highly intensive literature 
search and mathematical investigation was undertaken in the Phase 
I effort to identify a satisfactory probability distribution form. 
The final selection was a special case of the Grara-Charlier Series 
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(GCS) approximation which utilizes the standardized normal 
distribution as a generating function. 

This basic series is based on the restrictive premise that the 
parent distribution has the following properties: 

• Moments of all order exist. 

• Derivatives of any required order exist with appropriate 
continuity. 

• There exists high order contact at the extremities of the 
distribution. (See Appendix B, Section B.2.) 

Given these properties and the added restriction that the parent 
population is defined on an infinite interval, an approximating 
distribution can be generated by the sum of a system of independent 
skewed frequency distributions of the Gram-Charlier type. In 
principle, this series expresses the required parent population in 
terms of the derivatives of the standardized normal distribution 
with zero mean and unit variance. It also allows for the recursive 
series coefficient formulation while maintaining explicit 
asymptotic conformity to the central limit theorem. 

The notion of extreme measurement variation is also amplified 
in the development of the statistical data base mean profiles. The 
critical concern is that historical measurement data often reflect 
displacements which could be related to design changes or time 
sequence shifts in the operational aspects of the elements being 
monitored. In statistical terms, these variations appear as 
outliers and should not be given full recognition unless these 
effects are persistent from mission to mission. To accommodate 
this relative statistical mean stability problem, an unbiased mean 
estimator was formulated which de-emphasizes the relative time- 
varying worth of rapidly changing historical measurement 
observations . 

2.0 ANALYSIS APPROACH 

2 . 1 CONCEPT OVERVIEW 

To achieve a computationally efficient proof-of-concept 
assessment of the quantitative methods for monitoring LCC 
performance, a statistical confidence bound approach was 
conceptualized which comprises both real-time and preprocessing 
components. The conceptual overview of these processing components 
is illustrated in Figure 2.1-1. 

2.1.1 Data Segmentation 

From the preprocessing component perspective, the historical 
data coordinates, (t, x,) , of each measurement within a specified 
LCC measurement type were assigned to one of the following data 
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Figure 2.1-1 Conceptual Overview 








segment types: 

1) mandatory segment 

2 ) constant segment 

3 ) dropout segment 

4) spontaneous active segment. 

Under this segmentation criteria, a mandatory data segment is 
defined as a critical measurement time interval in which safe 
operational amplitude variation limits (redlines) are identified. 

By comparison, a constant data segment is defined as a time 
interval which has constant amplitude over its entire domain of 
definition. In juxtaposition, a data dropout segment is defined 
as a time-interval in which the data coordinates have been lost 
because of data dropouts resulting from such causes as 
telecommunications, computer transfer, etc. Finally, a spontaneous 
active data segment is any other time interval not meeting the 
above criteria. 

Having segmented the appropriate measurements of a given type 
from previous successful missions, these individual data segment 
intervals can be fit by a proprietary data sampling and smoothing 
algorithm. The underlying structure of this innovative algorithm 
is designed to characterize oscillatory periodic signals defined 
by discrete data points at near real-time data acquisition rates. 

2.1.2 Data Segment Fitting Algorithm 

To provide this capability the governing periodicity equation, 

F (W) = Cos[mWh] - C, Cos[(m-l)Wh] - C 2 Cos[ (m-2)Wh]~. . .- 
C m _ 2 Cos[2Wh] - C m _, Cos[Wh] - C m /2 =0 

for each measurement segment is solved for the constituent 
(harmonic) signal frequencies, W Jf for j=l, 2, . . . , m. Here, h 

is the equidistant data spacing of the respective measurement 
coordinates and m denotes the number of distinct periods embedded 
in the imposed data coordinates. The C,, C 2 , ..., C m 

coefficients of this governing periodicity equation are 
determined such that the linear combination of functional terms 

f(t) = Ao + S [A, Cos(W.t) + B, Sin(W,t) ] 

j=i J J 


provides the necessary accuracy of the required empirical data fit. 

To accelerate the processing requirements of this imposed 
solution for large data sets, an orthogonal vector projection 
scheme is utilized with recursive back substitution to compute 
the Aj and Bj scaling coefficients. This process allows for 
all computations completed in one step to be used in the next. 
In addition, it provides for controlled accuracy assessment 
during the computations. If at any point the length of one of 
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the projected vectors becomes zero (or near zero), the particular 
data sample associated with this vector is linearly dependent when 
compared to the previous data samples. Thus, this inherent 
recognition property provides the capability to perceive the limits 
of sufficient data, i.e., number of data points. In application, 
this process is continued over a finite data set to remove the 
possibility of a spurious data sample. If this trend continues, 
the projective vector space basis is satisfied and the remaining 
coefficients become zero, which provides a critical error deflation 
feature. 


2.1.3 Data Preprocessing Component 

The next preprocessing component phase statistically 
determines the unbiased segment mean profile. The formulation 
associated with this statistical determination utilizes an unbiased 
estimator of minimum variance to de-emphasize the relative time- 
varying worth of rapidly changing historical measurement 
observations. A detailed explanation of this formulation is 
provided in Section 2.2 for those interested in the mathematical 
details of this unbiased estimator. 

2.1.4 Data Confidence Bound Determination 

Having determined the unbiased segment mean profiles, a 
confidence level, a, is specified to reflect the percentage of 
measurements within the defined limits of the statistical 
confidence bound profiles. This evaluation utilizes the 
•'distribution-free" GCS approximation to characterize the sampled 
time-varying parent (measurement) pdf. From this pdf, the 
time-varying upper and lower confidence bounds are computed. 
These bounds are obtained by setting the integral of the pdf 

equal to (1-^) and a/2, respectively, and then solving for 
the associated upper and lower limits of integration. The 
mathematical details of this formulation process are presented in 
Section 2.3. 

If this bounding process is repeated for 

tj=t t +( j — 1) At for j=l, 2, 3, ... 

equidistant coordinates, the amplitude values of the upper and 
lower confidence bounds are defined. Continuous functional 
representations of these bounding profiles (i.e., of the above 
noted form) are then stored in the historical mission data base 
for future real-time component processing. A similar continuous 
representation of the time-varying mean profile is also stored in 
the historical mission data base. 

It is this integrated data base profile storage organization that 
provides the essential interface between the real-time and 
preprocessing component environments. Figure 2.1-1 illustrates 
this elemental processing feature by the two-way arrows between the 
historical mission and trend report data bases. 
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2.1.5 Real-Time Data Processing 

In real-time data processing, the segmented candidate 
measurement to be trended is superimposed on its associated 
confidence level profiles obtained from the historical mission 
data base and any out-of-bound conditions are identified. 

2.1.6 Confidence Bound Application 

In application, this confidence level concept provides an 
additional data assessment capability. Specifically, various 
confidence levels can be specified to identify aggregated 
confidence bands or regions which characterize the relative 
time- varying measurement amplitudes from their mean measurement 
dispersion. This basic confidence bound flexibility is based on 
the individual confidence bound proposition that 100 (1-a) percent 
of the parent measurement population is expected to fall within a 
specified (1-a) confidence interval. 

2.2 UNBIASED MEAN ESTIMATOR 

The problem of determining an unbiased mean estimator in the 
presence of statistical outliers has eluded many investigators. 
For example, the iteration-of-means procedure introduced in the 
Phase I LCC performance trending plan reflects this problem. 
That is, the order in which the x/s were taken affected the final 
iteration-of-means value. In addition, the procedure outlined 
in the above referenced plan imposed the restriction that the 
measurements within a segmented time interval had to be 
statistically independent due to original plan formulation. 

2.2.1 De-Emphasized Relative Measurement Worth 

Given the above restrictive provisions, other techniques 
were explored for de-emphasizing the relative time-varying worth 
of rapidly changing historical measurement observations. A 
weighted mean of minimum variance approach was utilized to 
accommodate this de-emphasis. 

To account for time-varying measurement amplitudes 
around a particular time, t j; the influence of previous 

and later time variations were considered in this approach. 
Figure 2.2-1 illustrates the statistical methodology used to 
assimilate these time-varying measurement effects. 
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Measurement 



Figure 2.2-1 Assimilation of Time-Varying Measurement Effects 


The amplitude of a measurement is sampled at equally spaced 
intervals on either side of the time of interest, tj. The 
resulting amplitude images are then projected on the amplitude 
time line. At this point, the projected images are considered 

representative statistical samples of the referenced (i m ) measurement 
variation in the vicinity (neighborhood) of tj. 

2.2.2 Weighted Mean of Minimum Variance 

Imposition of the constraint that the average image point 
for each measurement has the same expectation, combined with the 
condition that the linear combination of averages is unbiased and 
of minimum variance, completely defines the required mean 
estimator. 


If one imposes the condition that each image point average 
is defined as: 

X k = H 2 X kJ' 


where x kJ is the j 111 image point of the k m measurement and n is the 
total number of image points used to characterize the measurement 
variation, the above noted linear combination of x^' s becomes 


k 

x = 2 a,x,, 

where x is the unbiased mean estimator and x, represents 

the mean of the i 1 * 1 measurement for k distinct measurements. 

For this expression to be unbiased, the a ( weights must be 
constrained so that 

i a, = 1. 

i=i 

That is, for x to be unbiased and for each image point average to 
have the same expectation, E[x] = E[x,], 


E[x] 


E[ 


I a,x,] 

i=i 
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= 2a,E[x t ] 


= E[x,]|a, 

- E[x] . 

Similarly, the variance of this unbiased mean estimator, x, is 


VAR [X] = VAR 


[ i a * ] 


-2 a? VAR [ x, ] 
i=l 


■ I a '°* 


where o? equals the variance of the i th measurement image point 
sample variation, i.e., 

°? = n=T |( x ir 5 i) 2 * 

To determine the values of a t which make the variance of 
x a minimum, subject to the condition of unbiasedness, note that 

k-l k 

i=i i=i 


Therefore, 


k-l 

a k = 1-2 a,. 


Thus, the variance of x can be written in terms of a,, a 2 , 


*k-t ' 


k-l 

2 a?o 2 + 

i=i 

a 2 o 2 

a k a k 

k-l 

Z afof + 

i ‘-i- 1 ■< 

k-l 

1 afoj + 

is| 

[-I'* d*')'] 

which 

make this a minimum, 


differentiates with respect to aj and equates the result to zero: 


. 2[ajaf - (1 .|' aiK] 


= 2a J o J 2 -2a k o? 
= 0. 
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a.of 

Hence, aj = for j=l, 2, k-1. However, this relationship 

is obviously true for j=k. Consequently, one can sum over j=l, 
2 , . . . , k to get 


k 

s 

J=1 


a, — a. o. 


k v k 


k <i 

2 i- 

J=i °i 


But, 


k 

£a. = 1 so that 

J=i 


a k = 


o 2 y — 

k h 


Substituting this result into the above expression, 

1 


a J = 


k i * 

s i 


1 1=1 <J| 


k i 

Upon noting that 2 is constant for all values of j , the 

isl 

associated Xj weights are inversely related to the variance of the 

j lh measurement variation about its mean. In other words, 
measurements with relatively large variances are assigned 
accompanying small weights under this unbiased mean estimator. 


2.3 CONFIDENCE BOUND FORMULATION 


This section summarizes the algebraic derivation of the 
upper and lower confidence bound formulation used in this Phase I 
effort. In theory, this upper bound derivation involves setting 

the definite pdf integral equal to 1-^- and solving for the upper 
limit. Since the pdf is defined in terms of the GCS, 

1 - f “ C [ If- 1 !" KT ] dz 

= *(z)dz + t J (“l) n * (n) (z)dz j 

which results from C 0 =l and C,=0. See Appendix A, Section A. 4, 
for an overview of the mathematical details involved. 

2.3.1 Algebraic Rational Approximation 

Utilizing Hasting's rational polynomial approximation (see 
Reference 1) to evaluate the standard normal probability, 

-z 2 

*(z)dz = -;== e 2 dz 

-® 4271 -<» 
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becomes : 


P(z u ) - P *(z)dz = 
— 00 

where 



r L_ 

l+pz u 

l€(z u )| < 7.5 X 10‘ 8 


1 - *(«u)i bjlH + € ( Z ) 

J»1 


p = 0.2316419 

b, = 0.319381530 

b 2 = -0.356563782 
b 3 = 1.781477937 

b 4 = -1.821255978 
b s * 1.330274429 


Taking this approximation in conjunction with the relations 


* (n) (z)dz = * (n_1, (z u ) and * (n ' n (z u ) - (-l) n ‘ , *(z u )H n _,(z u ) 


one obtains the following algebraic expression for the upper 
confidence bound: 


g(z u ) = *(z u ) I b n [l+^;] n - 2 *(*u> H n-|( z u>“$ = o. 

Having shown in Appendix A, Section A. 2, that the product of the 

Gaussian error curve and (n-l) lh order Hermite polynomial, 
H n .,(z u ), can be written in the recursive form 

*(z u )*Wz u ) = G m+l (z u ) 

~ m+1 [ z u^m ( ”®m-l ( z u) ] 


with initial starting conditions 

G 0 (z u ) = *(z u ) and G,(z u ) - z u G 0 (z u ) . 

It follows that the optimal upper confidence bound is a root of 
the equation: 

g(z u ) = *(z u ) £ b n [ ] "- f , |r G n .,(z u )-| = 0. 

nsl L r' u J n=2 11 * 

2.3.2 Numerical Root Evaluation 


Before considering the numerical procedure used to extract 
this root, it is interesting to note that experience has shown 
that only four to six GCS coefficients are generally required to 
provide adequate series convergence. The exception is extremely 
skewed distributions. Nevertheless, the infinite limit of the 
right-most summation can be replaced by a finite integer in 
actual application. Consequently, 
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g(z u ) 



N 

I 

n=2 


n 

n! 


’n-l 



0 


where N equals the number of coefficients (terms) utilized in 
the GCS characterization. 


To invoke computational efficiency, an iteration procedure 
originally introduced by Muller in 1956 (see Reference 2) was 
used to consecutively compute approximations to the desired 
root. This determination was obtained by substitution of 

parabolic for linear interpolation in the derivation of the 
"reguli falsi” or method of false position (see Reference 3) to 
produce the iterative equation 

2c, 

z 1+l = z , - — , for i = 2, 3, .... 

b i ± 'J b i“ 4 a i c i 


Here the sign preceding the radical is chosen to make the 
absolute value of the denominator as large as possible. This 
sign convention results in z l+1 being selected as the closest root 
to z, of the approximating parabolic interpolation 


P(Z U > = a i ( Z u -Z i ) 2 + b i( Z u“ Z i) + c i = °* 


Since this formulation requires that p(z u ) pass through the points (z,_ 2 , 
g ( Zi _ 2 ) ) , (z M , g ( z,_j ) ) and ( z, , g(z,)), the coefficients 

a,, b, , and c, at the i+1 iteration step are determined from the 
conditions: 

c, = g(z,) 

= ( Z l-2- Z |) 2 [9( Z l-l>-9( 2 l> ]-( Z l-r Z l) 2 C9( Z l-2)~g( Z |) } 

1 ~ ( Z l-2- Z l) ( Z l-f Z t ) ( Z l-2- Z l-l) 

= (z,.,-z,) [g(z,_ 2 ) -g(z,) ]-(z,_ 2 -z,) [g(z 1 _,)-g(z 1 ) ] 

a ‘ ( z i- 2 - z i) ( Z l-f Z l> ( Z l-2 _Z I-l) 


To satisfy these conditions, three approximating roots are 
needed at each iteration step. This imposes the question of 
how to select the initial three approximating guesses. Under 
the assumption that the LCC measurement distributions are 
constructively (approximately) normally distributed, at each 
selected evaluation time tj , an initial approximation for 

z, can be obtained by realizing that 



and then using a rational approximation to determine z,. 

1S ' s 2.30753 + 0.270615 S + r a 1 

* 1.0+0.992295 S + 0.04481 S 2 L z J 


That 
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where 


0<f s 0 . 5 , 



and I e [ 2 " ] I < 3 x 1 0 -3 . 


Under this course of reasoning, the initial values of z 2 and z 0 are 
defined by adding and subtracting an arbitrarily small constant 
from Zj. 


Once the a 2 , b 2 , and c 2 coefficients are determined from these 
initial starting conditions, the z 3 iterate is computed and 
the procedure is reinitialized using z,, z 2 , and z 3 in place of z 0 , z,, 
and z 2 to determine the next approximation z.. At this 
point the procedure continues until a root is obtained within a 
prescribed a priori tolerance, i.e., |z 1+1 - z,| < tol. Table 2.3-1 
outlines the underlying algorithmic steps involved in this 
procedure . 


2.3.3 Summary Remarks 

As noted in step 4 of Table 2.3-1, this procedure involves 

the evaluation of the radical *|b 2 -4ac at each iteration step. 
Consequently, the procedure has the capability to determine 
complex roots, should they exist. Furthermore, the provision 
for supporting the determination of the lower confidence bound, 
z, , is provided in this same algorithmic framework. To 
obtain z, one need only substitute 2-a for a and impose the 
constraint that z u = -z u . 

The rationale behind these conditions is reflected by the 
provision that 

? = n [ „l ( ‘ i)n £ *“ (z) ] dz - 

In summary, an analysis of the convergence properties of 
this procedure for the selected Phase I LCC measurements showed 
that, on the average, only 14 iterations were required to 
obtain the upper and lower confidence bounds within a 
prescribed accuracy of 0.00032 measurement units. By 
comparison, the extreme iteration counts ranged from 1 to 47. 
Because of the assumed normal rational approximation utilized 
in the initialization process, normally distributed 
measurements require only one iteration, while highly skewed 
measurement distributions require substantially more. 
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TABLE 2.3-1. MULLER’S ALGORITHMIC STRUCTURE 


Given the confidence level a; upper confidence bound equation, 
g(t)=0; maximum number of iterations, N 0 ; and required upper bound 
(root) tolerance, tol: 


Step 1 


Establish initial starting iterate guesses: 


S 

t. 

to 



s 2.30753+0.270615 S 

1.0+0.992295 S+0. 04481 S 2 ' 

= t,-8 0 ; (8 0 = arbitrary small constant} 

= t^Sg ; 


Step 2 


Set h, = 



d = 
i = 


®o ; 

8q ; 

[g(t,)-g(t 0 ) ]/h,; 
[g(t 2 )-g(t,)]/h 2 ; 
[8 2 -8,) ]/ [h 2 +h,] ; 

2 


Step 3 While i£N 0 , do Steps 4-8. 

Step 4 b = 6 2 +h 2 d; 

i 

D = (b 2 -4g(t 2 )d) 2 {May require complex arithmetic.} 

Step 5 If lb— Dl < Ib+Dl then set E=b+d, else set E=b-d. 

Step 6 Set h = -2g(t 2 )/E; 

p = t 2 +h. 

Step 7 If Ihktol then 

OUTPUT (p) {Procedure successful} 

STOP. 

Step 8 Set t 0 = t,; (Prepare for next iteration} 

t, = t 2 ; 
t 2 = p; 
h t — ; 

h 2 = t 2 -t,; 

8, = [g(t,)-g(t 0 ) ]/h,; 

8 2 = [g(t 2 )-g(t,)]/h 2 ; 
d = [8 2 -8 1 ]/[h 2 +h,]; 
i = i+1 

Step 9 OUTPUT - Method failed after N 0 iterations. 

{Procedure unsuccessful} 

STOP. 
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2.4 


INTEGRATED BOUND FORMULATION 


Having outlined the statistical assumptions and 
computational features of the unbiased mean estimator and 
confidence bound formulations, this discussion addresses the 
integrated implementation aspects of the integrated bound 
formulation process. To facilitate an understanding of this 
three step process, a diagrammatic overview of these steps is 
provided in Figure 2.4-1. 

The pertinent implementation aspects of the first step 
recapitulates the necessity of considering an unbiased mean 
estimator to account for statistical measurement outliers related 
to design changes or time sequence shifts in the operational 
measurement displacements. Once the mean estimator is 
established for a referenced evaluation time, t j , the 
aggregated spectrum of assimilated time-varying image points can 
be utilized to generate a histogram of the measurement variations 
relative to the referenced observation time of interest. This 
histogram of aggregated measurement variations is depicted in 
Figure 2.4-1 as the second step of the process. The theoretical 
aspects of this histogram development will be discussed in 
Section 2.4.1 along with Sheppard's correction for mid-point 
class frequencies. 

Before the above noted mid-point class frequencies can be 
used in the third step of this process to compute the GCS 
coefficients and upper and lower confidence bounds, they must be 
transformed into z-coordinate moments by the standard z 
transformation of Figure 2.4-1. The theoretical development of 
this requirement follows from the general properties of semi- 
invariants under a linear transformation. These generalized 
properties are discussed in Section 2.4.2 for those interested in 
the mathematical details involved. 

2.4.1 Histogram Development 

Histogram development for characterizing the shape of a 
distribution from discrete sample data observations suffers from 
the unavoidable restriction that such representations are likely 
to exhibit large errors, unless the number of sample observations 
is large. This is reflected by the actual construction of a 
histogram. The range of the observation data is divided into 
arbitrary bins or class intervals, and the number or proportion 
of observations falling in each interval is assigned to the mid- 
point of the interval. As a result, small data samples often 
have class intervals which do not contain any observations unless 
the class interval length is increased. At the other extreme, 
the selected class interval could be so large that all 
observations fall in a single class interval. 
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Figure 2.4-1 Representative Bound Formulation Process 



2. 4. 1.1 Application Implications 

Regarded in this light, the class interval length selected 
for each aggregated spectrum of assimilated time-varying image 
points affects the smoothness of the resulting histogram. If the 
class interval is too small, the histogram is ragged; if it is 
too large, the histogram is over-smoothed and information is 
obscured. Although important, the influence of histogram 
smoothness is somewhat minimized by the constructive formulation 
of the current application. Fundamentally, the least sguares 
minimization imposed on the form of the GCS (see Appendix A, 
Section A. 3), combined with the constructive addition of 
assimilated time-varying image points, diminishes this effect. 

In recent years, this smoothing effect has received 
increased attention. For example, Rudemo (see Reference 4) has 
developed automatic smoothing estimation methods from the 
probability density kernel, by letting W(x) be a nonnegative, 
symmetric weight function, centered at zero and integrating to 
one. Under this method, W(x) can be the standard normal density 
function so that 

w h< x >=H w (fO 

is a rescaled version of W where h is the class interval 
bandwidth of the estimating function. As h approaches zero, W h 
becomes more concentrated and peaked about zero. As h approaches 
infinity, W h becomes more spread out and flatter. Under the 
assumption that W(x) is a standard normal density, W h (x) is a 
normal density with standard deviation h. 

In general terms, if x,, x 2 , ..., x„ is a sample drawn from an 

unknown pdf, f, an estimate of f is given by 

f h (x) = h i W h (x-x,) . 

1:1 

Such a function is called a kernel probability density estimate, 
since it consists of the superposition of "hills" centered over 
the sampled observations. In the case where W(x) is the standard 
normal density, w h (x-x,) is the normal density with mean x, and standard 
deviation h. 

In general, the bandwidth parameter, h, controls the 
estimating function smoothness and corresponds to the class 
interval length of the histogram. Theoretically, an optimum 
bandwidth value can generally be established under this approach 
by defining an a priori smoothness criteria, providing the number 
of sample observations are large enough to support the imposed 
criteria. 
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This histogram smoothing methodology, among others, was 
investigated during the Phase I effort. The conclusion was that 
all such approaches provided little if any benefit over the 
straight GCS minimization property augmented by assimilated 
time-varying image point construct. In fact, certain 
combinations of circumstances provided significantly worse 
histogram estimates. 

2.4. 1.2 Constructive Histogram Development 

Under the realization that the basic serial GCS gives the 
frequency function in the form 

*(z) - I a„* (n) (z), 

ruO 

where the various coefficients are expressed as moments or 
semi -invariants , the concern over histogram development 
methodology has a secondary effect. The important consideration 
is that moments are exclusively expressed as definite integrals 
which are often difficult to determine in extremely skewed 
distributions. Moreover, unless the observations are numerous, 
it is generally impossible to compute moments of higher order 
than six. In the current application, this results from the 
large errors arising from the random sampling and measurement 
variations about the unbiased mean estimator. As a rule of 
thumb, it is generally useless to compute moments from a 
histogram of higher order than four to six when the number of 
individual observations is less than 400 to 500. This rule is 
once again confirmed by the assimilated time-varying image point 
observations; only the lower limit must be extended to 
approximately 460 for this application. 

Failure to abide by this rule results in exhibiting a some- 
what "poor fit" to the originating observation data and often 
gives rise to negative frequencies at the extreme tails of the 
associated GCS distribution. From a purely practical consider- 
ation, this last objection has little effect on the upper and 
lower confidence bounds, because the originating observations at 
the distribution extremities are very few in number. As a re- 
sult, the constructive histogram development approach implemented 
in the Phase I effort divided the observation data into class 
intervals centered over the unbiased mean estimator in such a way 
that at least every interval contained one or more observations. 

2 . 4 . 1 . 3 Adjusted Method of Moments 

In basic statistics, the k lh moment of a particular frequency is 

defined as the product of the frequency times the k th power of the 
distance about which the moment is required. To utilize this 
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definition to evaluate the moments of a relative histogram about 
a selected point, one is forced to assume that the various 
histogram frequencies are concentrated at the mid-point of their 
respective class intervals. In addition, the problem of 
determining a reference point about which to take moments that 
produces a pre-assigned first moment 


M, = 


N 

IS 

j=l 


J f J 


N 

If 

j=i ■ 


remains an open issue, since the Sj distances are unknown 

variables. However, the denominator of the above moment 
expression equals 1.0, given that the f j ' s are relative 
histogram frequencies. The solution to this problem can be 
further motivated by letting the distance between A, the point 
about which the moments are known, and B, the point about which 
they are required, be d. Then, if the distance of any histogram 

frequency, fj, from A is Xj, and from B is x jf x j =X j -d and Xj=(Xj-d) k . 

To illustrate how the above distance relationships can be 
used to evaluate d, consider the hypothetical histogram data of 
Table 2.4-2. In this illustrative data sample of 1,000 
measurements, the class interval length is 5 measurement units 
and the imposed unbiased mean estimator equals 0.480. The 
numbers -4, -3, -2, ... in column (3) are the respective 
distances from the class interval mid-point values from d, in 
terms of the unit of grouping. By forcing the condition that 
the unbiased mean estimator, x, must equal Mj, the expression 


is imposed where h is the assigned histogram class interval 
length. Therefore, 



-*[ 

from the fact that the fj's are relative frequencies, i.e., 

N 

Ifj=l. 

j=l 3 

Consequently, 

N 

d = X X,f, - hx. 
j=i 

Therefore, the adjusted class interval distance is 

d = 79.400 - (5) (0.480) = 77.000 

for this illustrative example. As a result, the totals of 
columns (4) to (7) are the respective first, second, third, and 
fourth discrete moments about x. 
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Adjusted Method of Moments 


Table 2.4-2 Adjusted Method of Moments 
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Total number of measurements 






















Recalling the assertion of Section 2.4. 1.2 that moments are 
exclusively expressed as definite integrals, a correction for 
treating histogram observations as concentrated frequencies at 
the class interval mid-points will be discussed in the remainder 
of this section. The theoretical aspects of this undertaking are 
focused on the desire to calculate the magnitude of discrete 
moment adjustments so that the true moment magnitudes can be 
assessed. 

Given that the values of the discrete moments, r^, are 
obtained from finite sums and not as definite integrals, they are 
subject to certain adjustments if one wishes to express them as 
continuous moments. The magnitudes of these adjustments can be 
computed from well-known formulas from the theory of numerical 
quadrature (see Appendix B) if the frequency function and its 
derivatives vanish for x=-» and x=+». The English 
mathematician, Sheppard, has developed the following correction 
formulas for the transition from discrete (M k ) to continuous 
(n k ) moments: 

>s, - »o 

», - M, 

IN “ m 2 - T? 

«3 = M, - T M , 

», = M < - T + 175 «o 

^5 ” M 5 “ 6 M 3 + 48 M i 


IN ’ i(S)( 2 ‘' n - 1 > B A- n h" 

n=Q 

where h equals the assigned class interval length and the B n 
coefficients are the Bernoulli numbers generated by the recursive 
form 



with starting value B 0 =l. 

These correction formulas can also be used to assess the 
effects of roundoff errors. For example, the third correction 

formula shows that a mean error of ^ in x affects M 2 only as In 

summary, Sheppard's corrections emphasize the fact that the 
method of moments works with curve areas instead of curve 
ordinates. 
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2.4.2 


Semi-Invariant Linear Transformation Properties 

The evaluation of the GCS coefficients (using the semi- 
invariant relationships outlined in Appendix A, Section A. 4) 
defined by the moments in the x-coordinate (measurement) system 
must be transformed into z coordinates by the transformation, 



This implies both a change of origin and scale. Specifically, 
the origin is transformed to z=0 and the scale is altered by the 

scaling factor where a is the parent measurement population 
standard deviation estimate. 

Denoting the operator form of the x-coordinate system semi- 
invariants by K,(x), K 2 (x), K 3 (x), ... with transformed origin 

\,(z), the general linear transformation z=ax+b becomes: 

Kj(z) = K,(ax+b) 

= a\,(x)+b 
N 2 (z) = K 2 (ax+b) 

= a 2 K 2 (x) 

Letting \,(x) = St and \ 2 (x) = o 2 , 

K, (z) = a*+b 
K 2 (z) = a 2 a 2 . 

Since the coordinate system of reference (z system) must have a 
zero mean and unit variance, Kj(z)=0 and \ 2 (z)=l so that 

a£+b = 0 
ao = 1. 

. . . l 

From this relationship a= ^ and b= -j, which corresponds to the 
above transformation form. 

Moreover, \j(z) = ^ \j(x) for all values of j greater than 2. 

Consequently, the following simple rules can be used to epitomize 
the general semi-invariant computations: 

• Set N t (x) = St the unbiased mean estimator. 

• Compute Kj(x) for all values of j equal to or greater 
than 2. The numerical values of these paramenters 

divided by a*, for j=2, 3, 4, ... are the semi- 
invariants of the histogram frequency in the z system of 
the GCS. 
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3 . 0 DEVELOPMENT FINDINGS 


3.1 DATA TRANSFER COMPLICATIONS 

To accommodate the data transfer requirements of the Phase I 
effort, the individual raw data coordinates for each measurement 
had to be identified and transferred to 1.2 megabytes, 5 1/4 inch 
diskettes. Rockwell International, Inc. was tasked to collect and 
transfer this mission data to ATI. This data included the 
referenced missions 51-1, 51-J, 61-A, 61-B, 61-C, and the STS-26 
mission measurements used for proof -of -concept trending. 

The raw data coordinates associated with these six missions 
constituted a data storage requirement of approximately 100 
megabytes of data. Given that there were 17 individual 
measurements involved with the four different measurement types per 
mission, each measurement represented an average data storage of 
980 kilobytes. 

During the verification of data readability, it was determined 
that there existed time gaps in the acquired data. This problem 
was attributed to data dropouts resulting from such sources as 
telecommunications, computer transfer, etc. To overcome this 
problem, numerous data corrections were required. To save time and 
effort, ATI was directed by Mr. Pizzano, chief of the MSFC 
Reliability and Maintainability Engineering Division, to consider 
only measurement data from T-600 seconds to T+600 seconds. As a 
result, all measurements were processed in this directed time 
interval during the data reduction phase of this effort. 

3.2 SOFTWARE PARAMETER TUNING 

During the early phase of prototype software development, it 
was recognized that individual confidence bound profiles would 
exhibit sufficiently different spatial behavior to warrant 
definition of specific processing parameters. To reconcile this 
acknowledgement, a parameter declaration option was included in 
the Parameter File Generation (PARMGEN) software module for 
individual segment processing parameter specification. In 
retrospect, the fortuitous inclusion of this option was confirmed 
and utilized for confidence bound segment profiles involving 
measurement variability. 

To illustrate the rudimentary effects of confidence bound 
sensitivity, consider the tabular spectrum (Table 3.2-1) of image 
point generation parameters used to compute 97 percent confidence 
bound profiles for the highly variable SRB APU Turbine Speed LCC 
measurements. Assessment of the experienced confidence level 
bounds associated with various parameter combinations is summarized 
in Figure 3.2-1. Here the maximum and minimum deviations from the 
stipulated confidence level are the 50 and 26 image point cases 
shown graphically in Figures 3.2-10 and 3.2-6, respectively. In 
terms of percent confidence level variation, these same parameter 
combinations exhibit confidence bound differences of only 1.52 and 
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Table 3.2-1 Effects of Confidence Bound Sensitivity 
to Image Point Parameter Specification 
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* For 20 Measurement Sample Size, Le. 4 
Per Mission Over 5 Missions. 



-0.16 percent, relative to the stipulated or baseline confidence 
level. 


It should be noted, however, that close scrutiny of the 
graphical measurements of Figures 3.2-2 through 3.2-12 reveal 
that mission 61-C has one measurement with far less data 
resolution than the others. Removal of this questionable 
measurement only affects the lower confidence bound profiles in 
a narrow band around the time periods t=-18.70 and t=-17.07 
seconds and substantially leaves the average percent of in-bound 
measurements unaltered. This provides further evidence of the 
unbiased mean estimator efficiency. 

Similar spectral results were obtained from the ET LH 2 
Ullage Pressure and SSME LPFTP Discharge Temperature 
measurements. By comparison, the RSS Safe-and-Arm Device is of 
particular interest from a proof-of-concept standpoint in that it 
represents a binary on-off signal. Statistically, this type of 
measurement can only produce variation during its switch-over 
cycle whenever the phase shift timing is different between the 
various measurements. This aspect was not analyzed in this Phase 
I effort because it would have required the specification of 
infinitesimal At distances for the image point processing 
parameters, as well as the parameter used to define the adjacent 
distance between tj confidence bound profile evaluations. 
Nevertheless, it is a surprising accomplishment to statistically 
provide such confidence of measurement samples utilized in this 
proof-of-concept effort. 
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Figure 3.2-2 2 Right And Left Side Image Points 
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Figure 3.2-3 3 Right And Left Side Image Points 
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Figure 3.2-4 5 Right And Left Side Image Points 
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Figure 3.2-5 9 Right And Left Side Image Points 








30 


Figure 3.2-6 13 Right And Left Side Image Points 
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Figure 3.2-7 17 Right And Left Side Image Points 





Figure 3.2-8 21 Right And Left Side Image Points 
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Figure 3.2-9 23 Right And Left Side Image Points 
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82.0800 



Figure 3.2-12 37 Right And Left Side Image Points 




3.3 PROTOTYPE SOFTWARE COMPOSITION 


Having demonstrated the statistical adequacy of experienced 
confidence level predictions and profile shaping parameter 
sensitivity, this section addresses the prototype software 
composition. 

As with most prototype software development, there existed 
two overlapping and conflicting concerns: the desire to maximize 
code efficiency and the necessity to achieve recognized software 
clarity through relational code structures. To overcome these 
primitive code development concerns, the concept of establishing 
a subjective code quality metric was introduced. In this context, 
the term metric is defined as a measure of the extent or degree to 
which a given program segment (subroutine) exhibits each of the 
above concerns. This subjective code development framework also 
aided the mathematical formulation task in that algorithm 
formulation refinements could be correlated to the implemented 
software efficiency and clarity. 

At this point, attention is directed toward the preprocessing 
and real-time software components introduced in the concept 
overview of Section 2.1. As indicated in this section, the 
preprocessing software component is responsible for data reduction, 
segmentation and data conditioning tasks needed to reduce the LCC 
measurements of a given type into a usable data format for 
statistical data assessment. It is this data conditioning 
mechanism that gives the overall process its efficiency in such a 
demanding computational environment. 

Figure 3.3-1 provides a summary of the module level 
interaction used in this prototype software implementation to 
structure the above noted processing requirements. 
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DATA REDUCTION 



Figure 3.3-1 Overview of Software Module Implementation 




DATA CONDITIONING 
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Figure 3.3-1 Overview of Software Module Implementation (Cont’d) 
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Figure 3.3-1 Overview of Software Module Implementation (Cont'd) 



3.4 PROOF-OF-CONCEPT TRENDING RESULTS 


During this phase of the effort, applicable confidence level 
segment profiles were generated for the four selected LCC 
measurement types from the five missions prior to 51-L. Then 
attention was directed towards determining the confidence level 
sensitivity of each segment profile. To assess this sensitivity, 
the average percent of measurements within the respective 
confidence bound profiles was determined. Graphical substantiation 
of this sensitivity was also provided by looking at the confidence 
interval distance as a function of segment time and confidence 
level uncertainty for each applicable data segment and measurement 
type. 


Once this activity was completed, the relevant LCC trending 
reports (graphic displays) associated with the applicable STS-26 
measurements were induced and analyzed for trend anomalies. 
Indicative trending results for selected STS-26 SRB APU Turbine 
Speed, ET LH 2 Ullage Pressure and SSME LPFTP Discharge 
Temperature measurements are graphically presented in Figures 
3.4-1 through 3.4-3 for 95 percent confidence level bound 
profiles. The image point parameter specification used for each 
of these figures is provided below: 

1) Delta distance from tj eguals 0.01562. 

2) Distance between adjacent tj's eguals 0.03125. 

3) Number of assimilated image points per measurement 
equals ten. 

In Figure 3.4-1, the peculiar (alternating) phase shifts of 
the selected STS-26 measurement relative to the upper and lower 
confidence bound profiles appears to reflect a shifted turbine 
speed start-up sequence on the STS-26 mission. Another related 
phase shift phenomenology is illustrated in Figure 3.4-2. In 
this instance, the upper and lower ullage pressure confidence 
profiles appear to have a shorter period than the selected STS-26 
measurement. This spatiotemporal phenomenon appears to be 
related to tank time/temperature differences between individual 
mission launch environments. Another point to be considered 
regarding the ullage pressure assessment is that only 15 
measurements were available, i.e., three measurements per 
mission, over five missions. 

In reference to the SSME LPFTP Discharge Temperature 
trending assessment of Figure 3.4-3, it should be noted that both 
Channel A and B data (i.e., with markedly different data 
resolutions) were used in this STS-26 Channel B characterization. 
By comparison, the pursuant RSS Safe-and-Arm Device trending 
assessment of Figure 3.4-4 shows no upper and lower confidence 
bound profiles because the binary on-off signal (measurements) 
lack statistical variance. See Section 3.2 comments for added 
clarification. 
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STS-26 Trended Data 
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Figure 3.4-1 STS-26 Trended Data for SRB APU Turbine Speed 
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Generated Statistical Confidence Bounds 
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Figure 3.4-4 Generated Statistical Confidence Bounds for 

RSS Safe and Arm Device 



Further illustrative results are provided in Figures 3.4-5 
through 3.4-7 which show the upper and lower confidence bounds 
superimposed on the generating measurements. It is of particular 
interest to note the in-phase measurement grouping of Figure 3.4- 
5 within the approximate time interval -19.8 to -15.5 seconds. 
Another interesting peculiarity is noted in Figure 3.4-7 where the 
Channel B measurement of Main Engine One associated with mission 
61-A appears as a statistical outlier. 

To check the confidence level sensitivity associated with the 
measurements depicted in Figures 3.4-5 through 3.4-7, equidistant 
time cuts with 0.250 spacing were constructed over the indicated 
confidence profile intervals. Then these intervals were evaluated 
and the averaged proportion of in/out measurements were used as the 
actual confidence level. These values were 96.3, 95.8, and 95.01, 
respectively, for the measurements of Figures 3.4-5 through 3.4-7. 

4.0 CONCLUSIONS AND RECOMMENDATIONS 

4 . 1 CONCLUSIONS 

It has been demonstrated that the GCS is a "distribution-free" 
pdf with explicit asymptotic conformity to the central limit 
theorem. In addition, the capability to generate highly accurate 
upper and lower confidence bound profiles from this methodolody has 
been established. Consequently, it can safely be stated that this 
proof-of-concept is substantiated. 

4 . 2 RECOMMENDATIONS 

Given that the concepts of an LCC measurement population and 
the selection of random measurement samples are basic to this 
inductive LCCTA methodology, it is recommended that the Phase I 
effort be proceeded by adding expanded LCC measurements of selected 
types from previous missions. To help reveal any underlying 
idiosyncrasies embedded in these measurements due to peculiar 
operational phenomenologies or design changes, it is further 
recommended that the aggregated bound variations from mission to 
mission be assessed to identify any mission/measurement drivers. 

In addition, it is recommended that the Phase I prototype 
software be refined and integrated into a self-contained package 
to support the above processing requirements. This activity should 
also include the preparation of full-user software and methodology 
documentation to permit potential integration of this trending 
technique into other systems and applications. 
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Generated Statistical Confidence Bounds 


ORIGINAL PAGE IS 
OF POOR QUALITY 



47 


Figure 3.4-5 Generated Statistical Confidence Bounds 

SRB APU Turbine Speed 
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Figure 3.4-6 Generated Statistical Confidence Bounds 

ET LH 2 Ullage Pressure 
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APPENDIX A 


STATISTICAL FORMULATION 

A. 1 Historical Overview 

The primary mathematical focus of the Phase I proof-of-concept 
effort was directed toward the development of a ’'distribution-free" 
characterization of the time-varying statistical confidence bounds 
associated with historical Launch Commit Criteria (LCC) measurement 
data profiles. Once developed, this capability provides the 
opportunity to assess the occurrence of potential measurement data 
anomaly trends and other abnormal operational conditions relating 
to erratic data amplitude and phase shift variations. 

Since the statistical data variation contained in historical 
successful mission measurements must be related to aggregated time- 
varying probability density functions, the attainment of these dis- 
tribution parameters are of paramount importance to the stability 
of the resulting confidence bound assessment. The capability to 
characterize these distribution parameters is set in the early sta- 
tistical investigations conducted by Gram (see Reference 5) . In 
this basic research, Gram showed that an arbitrary frequency dis- 
tribution, defined on an infinite interval, can be generated in 
terms of the sum of independent frequency distributions. It was 
also expounded in this original work that the normal symmetrical 
Gaussian error curve is but a special case of a more general system 
of skewed frequency distributions which can be represented by an 
approximating series expansion. 

It is also interesting from a historical perspective to note 
that Thiele confirmed this far-reaching idea in 1887. The sub- 
stance of this confirmation was later published in a book entitled, 
"A General Theory of Observations" (see Reference 6) . Among the 
added frequency distribution achievements of Thiele was the intro- 
duction into statistics of the notion of moments, which he gave the 
name "seminvariants. " Subsequent statistical literature usage of 
this term is "semi-invariants." By means of these semi-invariants, 
Thiele illustrated the relationship between sample data histogram 
frequency approximations and the series expansion of Grants 
arbitrary frequency distribution. 

In addition to Thiele's work, the Swedish astonomer C. V. 
Chari ier introduced an efficient computational methodology for 
expressing the various distribution parameters from Thiele's semi- 
invariants. This was accomplished by the utilization of both 
orthogonal functions and integral equations. Thus, this series 
expansion was designated as the Gram-Charlier Series (GCS) in the 
subsequent statistical literature of the time. 

With this historical background, the specious question of why 
this series expansion has not received the attention it deserves 
in the current literature is more speculative. The most probable 
reason is the fragmentary and unsystematic manner in which it was 
developed. 
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A. 2 Mathematical Background 


To place this GCS analytical development in its proper 
orientation, a remarkable functional property between Hermite 
polynomials and the repeated differentiation of the Gaussian error 
curve. 


exists. 


for k=0. 


Denoting the various derivatives of ♦(z) by 
* k (2) = (HZ)) 


1/ 2, . we obtain the following relations: 


* (0) (z) = 

^ (1) ( Z ) 

* (2) (z) = 
* (33 (z) = 

* {4 ) (Z) = 


*< 2 » 


= - z’i’(z) 



(z 2 -l)'i'(z) 
- (z 3 -3z)i , (z) 
{z^-6z 2 +3)^i {z) 


Given that these derivatives are represented as the products of 
polynomials of z and the Gaussian error curve itself, it is 
only natural to inquire about the functional polynomial forms 
involved. As it turns out, these polynomials are known as 
Hermite's polynomials from the name of the French 
mathematician, Hermite, who first introduced the recursive 
formulation, 

Hn + ,(Z) = ZHn(Z) - nHn.^Z) 

for this class of orthogonal polynomials. Upon utilizing this 
recursive formulation with starting values, 

H 0 (z) = 1 and H, ( z ) = z, 
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the successive polynomial forms: 

H 2 (z) = z 2 -l 

H 3 (z) = z 3 -3z 

H 4 (z) = z 4 -6z 2 +3 
* 

are exhibited that are contained in the various derivatives of 
the Gaussian error curve. 

From a numerical computation point-of-view, it is 
convenient to recursively use the expression 

<W Z ) = H+I t zG n( z ) " G n-i ( 2 ) 3 

together with the starting values 

G 0 (z) - * (z) and G,(z) = zG 0 (z) 
to obtain the general form 

* (0) (z) = *(z)H 0 (z) = G 0 (z) 

* (n (z) = -’i' ( z ) H, ( z ) = ~Gj ( z ) 

* (2) (z) = *(z)H 2 (z) = 2G 2 (z) 


* n (z) = (-D^zJHJz) = (-l) n n!G n (z) 
of the Gaussian error curve derivatives. 

Before addressing the generation of the GCS , it is 
necessary to discuss some important integral equation 
relationships between the Hermite polynomials and the various 
derivatives of *(z) or * l0) (z). To this end, consider 
the following series of functions: 

* (0) (z), * (1) (z), * (2) (z), * (3) (z), ... 

Hj (z), Hj ( Z ) , H 2 ( Z ) , Hj ( Z ) , ... 


* (n) (z) = (-l) n ¥(z) HJz) 
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where 



with 


Lim (* (n) (z) ) = 0 for n=0, 1, 2, .... 

Z~>± 00 

To geometrically illustrate these relations. Figure A-l 
contains the first four derivatives of * ( z ) as well as ^(z) 
itself in composite form. Clearly, *( z) and all its 
derivatives of even order are even functions of z, while all 
the derivatives of odd order are odd functions of z. 
Furthermore, *(z) is a single valued positive function with 

maximum value of at z»0 with points of inflection 

at z=±l which approach the abscissa axis asymptotically in both the 
positive and negative directions. With regard to the first 
derivative, 


* 0) (z) * -*(z)H,(z) , 

the maximum and minimum values are located in the neighborhood 
of z=— 0 . 9 and z=0.9, respectively, with similar asymptotic tail 
properties. By comparison, the second derivative possesses a 
minimum at z=0 and the third derivative has approximate first 
minima and maxima at z=-0.7 and z=0.7, respectively, while 
maintaining the required asymptotic tail property. This same 
geometric behavior is illustrated for the fourth derivative. 

In this case, a major maximum at z*0 is attained and * H) (z) crosses 
the abscissa axis from positive to negative in the neighborhood 
of z=±0 . 75 . 

As a result, these functional characteristics resemble 
the sine and cosine functions encountered in Fourier series 
harmonic analysis. This same analogy relates to the functions 

* [n] (z) and H,, ( z ) contained in the above noted 
series. That is, these functional forms characterize a 
biorthogonal system in an infinite interval. In mathematical 
terms, such biorthogonal functions satisfy the following 
conditions: 

1) All functions are real and continuous over their 
plane of definition. 

2) No function is identically zero in its plane of 
definition. 

3) Each pair of functions satisfy the integral relation: 

J* H (z)* (n) (z)dz=0 for n*m. 

To prove this self-evident relation, note that 
* (n) (z) = (-l) n *(z)H n (z) and * tm) (z) = (-l) m «’(z)H m (z) so that 

j” co H m (z)'t (n, (z)dz = (“l) n J* oe H m (z)H n (z)*( 2 )dz 
- (-l) n ' m j W H„(z)* lm) (z)dz. 

— CO 


A- 4 



GAUSSIAN ERROR CURVE WITH DERIVATIVES 



l-5 


Figure A-l. Composite Gaussian Error Curve with up to Four Derivatives 



Since these integral equations hold for all values of m, n = 0, 

1, 2, . .., it is only necessary to prove the proposition for n>m. 

By the application of integration by parts 

= H m (z)* (n - l) (z)H - /^(zj^-'^zjdz 

where the above notation, H^(z), denotes the first derivative of 
HjJz) . Clearly, the first term on the right side of the above 
equation reduces to 0, since ( z ) =0 for z =±« and because the 
order of H^z) is lower than * (n) (z) . Consequently, 

/^(zjl^zjdz = -C^( Z )^(z)dz 

= jr <x> H ^ ) (z)* (n_2 ) (z)dz 
= -rX 3) (z)^ 3) (z)dz 

by three repeated applications of partial integration. If this 
process is continued m+1 times, 

f H m (z)H [n] (z)dz = (-D-r.Hf^O^-^zJdz. 

— 00 —00 

Since H^z) is a polynomial of m m degree, its m+1 derivative 
is equal to zero, which provides the required proof that: 

/"^(zj^zjdz = 0 

for all values of m and n where n*m. 


If n=m, exactly the same procedure may be utilized, but 
the partial integration process is stopped after the n th 
integration , i . e . , 

/^(zj^fzjdz = (-l) n j“ a> H^ , (z)'i'(z)dz 

after replacing m by n and noting that * (n " n) (z) = * (0) (z) = i’(z). 
However, the n th derivative of the n th degree polynomial, H^z), has 
the constant value n! for all n=0, 1, 2, .... 

Therefore, 

/‘^(zj^’tzjdz = ( ~^l n! J^e+dz - (-l)"n! 
as a result of the standard normal distribution property 


1 

V2n 


-z 2 


f e 2 dz=l. 

* -00 
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A. 3 Weighted Square Error Coefficients 


Given the biorthogonality property between the functions 
* (n) (z) and H^z) , a generalization of the original series 
introduced by Gram will be derived which minimizes the weighted 
square error of the approximating probability density function 
(pdf) . 

To facilitate this development, it is appropriate to 
consider a special case of Gram's original series which 
utilizes the Gaussian distribution as a generating function: 

*(z) = f a n i (n! (z) . 

n-0 

To evaluate these coefficients in the least squares sense, 
consider the following minimization condition: 

t(z)-fa n i (nl (z) "1 2 

p2__ I dz = Min. 

4*UT 

That is to say, the coefficients are determined such that the 
sum of the squares of the differences between §(z) and the 
approximating distribution series is a minimum. 

To accommodate this solution, note that 



Sa n * (n) (z) = i(-l) n a n '*'(z)H n (z). 

n=0 n=0 


On the basis of this condition, 

fa„* w (z) 


G(z) = 


_ n=0 




so that 


w(2> ' Jl [ 


*(z) 


- G(z) dz 


] 


_ f" r [*(Z)] 2 „»(Z)G(Z) .2 1 

'J.. 2_ jfrfr [ ( n J 


To minimize this expression, impose the standard least squares 
condition that 


3W(z) . _ . „ , „ 

= 0 for 3 = 0 , 1, 2, ...; 


o = j- r Liiml dz - 2 — f + J- r r G (z\ i 2 dz 

0 3aj *(z) az ^ J -oo ^ 3aj J.„ [G(Z) ] dz ' 
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Clearly, the first integral on the right-hand side of this 
equation equals zero, since the integrand is not a function of 
a r Furthermore, talcing the partial derivative with respect to 
aj under the integral of the second right-hand term, 


f *(z) 3G(z) 

J J*(z) 3a 





S(-l) n a n H n (z) 

n=0 


] 


dz 


- -2j“ (-l) J *(z)H (z)dz. 

-00 J 

Similarly, the last term becomes: 


C sir c G < z » 2 dz - C 2<s(*> tst 1 dz 


= 2 


= 2 


O 

r [ 


S(-l) n a n H n (z) 


n=0 


S(-l) n a n *(z)H n (z) 


-00 
. 00 


n=0 


] [ 


= 2 f i(-l) n a n * (J) (z)H n (z)dz 
J n= o 


^ (-i)i^fTiT Hj (z) j dz 
(-1) J Hj (z) 1 dz 


= 2ap! 

from the biorthogonality property between * (m) (z) and H^z). 
Equating these integral results to zero and solving for a jf one 
obtains the following coefficient relationship: 

a, = -^n^-f 00 *(z)H.(z)dz for j=0, 1, 2, .... 

Substituting Cj = J $(z)Hj(z)dz, one obtains the standard GCS 
approximation form 

*(z) = I(-l) n ^-* (n) (z). 

n=0 n * 

See Korn for an analogous expression (Reference 7) . 

A. 4 Physical Coefficient Interpretation 

From a physical coefficient interpretation, two 
mathematical concerns must be explored and validated. First, 
one must illustrate that the individual GCS coefficients exist 
for convergence, and second, that the series has the necessary 
properties to become a probability density function (pdf) . 
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In theory, the existence of the series coefficients follow 
if the first two derivatives of $(z) are finite and continuous 
in the infinite interval and both of these derivatives vanish 
at z=±». In more precise mathematical terms, the series 
converges at all points of continuity if §(z) is of bounded 
variation. A most interesting discussion of the convergence 
properties of this general type of series can be found in 
Cramer (Reference 8) . 

To illustrate the second concern, consider the approximating 

GCS : 

*(Z) = i(-l) n * tn) (z) 

n=0 n: 

where 

C n = /^(zJHJzJdz. 

Given these weighted square error minimizing coefficients, the 
existence of these coefficients and the necessary conditions 
for a pdf can be unified. 

Since it is necessary (but not sufficient) to have 

f $(z)dz = 1 

4 -CO 

for *(z) to be a pdf, it follows that 

I(-l) n *(n) (z) 1 dz 
n=0 n • J 

“ I [ fr | ] 

by taking the integral under the sum and then utilizing the 
identity that * (n) (z) = ( -1 ) n * ( z) ( z ) . This integral sum 

interchange is valid if and only if the referenced series is 
convergent. By employing the biorthogonality property between 
* (n) (z) and HjJz) ; 

C *<*>".<*>«* = [?; g? SJ: 2 - 3 - ••• 

Since the GCS is convergent, the necessary condition for $(z) 
to be a pdf imposes the condition that C 0 =l. 

A slightly different type of investigation can be used to 
access the implications of the remaining coefficients. Here 
the statistical terminology of moments will be extremely 
useful. In this terminology, the quantity 

^(b) = E[(z-b) k ] 
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(when it exists) is called the k m moment of the random 
variable z about b. In integral terms, this statistical 
expectation operator becomes: 

E[(z-b) k ] = f (z-b) k f (z)dz, 

where f(z) is a pdf. Whenever b=0, the expectation E[z k ] is 
written as 

n k = m; (0) - E[z k ] = z k f ( z ) dz 

to denote the k m moment about z=0. In addition, the first 
moment of a random variable about its expectation E[(z-»i)] 
is equal to zero. To deduce the implication of this statement, 
utilize the linearity property of the expectation operator 
(i.e., E[ax+by] = aE[x]+bE[y]) to obtain: 

E[(z-n t )] = E[z]-E(Hj] = 0. 

The last step of the above reduction results from the fact that 
the expectation of a constant equals that constant, e.g., 

ECu,] 38 ^. 

In distribution terms provides a measure of the 
distribution centering or expected location. In physical terms 
n, corresponds to the distribution center of gravity. To 
illustrate this point, let the z-axis be thought of as a bar 
with variable density. Under this interpretation the density 
of the bar at any point is given by f(z) so that = E[z] 
is the center of gravity of the bar. Therefore, Hj may be 

thought of as a central value of the distribution. 
Consequently, moments of a distribution about their expected 
value are called central moments. 

Mathematically, the second moment Etfz-iij) 2 ] is called the 

variance of the distribution and is often written as o 2 . 
To interpret the physical implications of this terminology in 
terms of an arbitrary distribution, f(z), 

o\ - E[(z-u,) 2 ] 

- O z "lM 2 f (z)dz 

- j" w [z 2 -2zu,+n 2 ]f (z)dz 

= f" z 2 f (z)dz-2y. f" zf (z)dz+n 2 f°° f(z)dz. 

* —00 —oo * —oo 


Since f" f (z) dz=l for f(z) to be a pdf, the terms 

* -00 

\i, » J* zf(z)dz and n, * f" z 2 f(z)dz 

1 * —00 * * —oo 
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are respectively the first and second moments of f(z). Back 
substitution into the above expection: 


0 ! - »*2 - 2 ^f + ^ = ~ nf. 


As a result, o 2 is a measure of the spread or dispersion of the 
distribution about its center of gravity or expected value. 


Expressing this central moment notation in specific GCS 
coefficient terms by equating \ - J ^(z-n,) 1 ^ (z)dz one obtains the 
original semi- invariant (moment) equations deduced by Thiele: 


N * 0 
*2 * ^2 " 

K 3 - u 3 - 3^ + 2nf 

” 4H 3 H ( " 3 M 2 + 12|* 2 nf " 6^ 



. . . , for N>1 . Here, the notation 

standard binomial coefficient 

f N -M _ (N-l) ! 

I n ) n! (N-n-1) ! ' 

Continuing under the realization that 

z * H, (z) 
z 2 = H 2 (z)+1 
z 3 = H 3 (z)+3z 
Z « = H 4 (z)+6z 2 -3 


where a, 

(V) ■ 


/ a 2 =X 2 , 

denotes the 


for N>1 where 
obtains: 




Z " “ n!2"(N-2n) ! 


,N-2n 


N 


^ denotes the greatest integer function, one 


pi, = f°° z*(z)dz 

I * -CO 

■ J* H,(z)*(z)dz 

-00 

= c, . 
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This identity was acquired by substituting z=H,(z) and then 
using the weighted square error coefficient relationship 
developed in Section A. 3. This same telescoping moment 
generating procedure produces the following moment coefficient 
identities: 


\i 2 = C 2 +l 
1*3 = C 3 +3C, 

= C 4 +6C 2 +3 


*** °n + S n »2 n (N-2n) I Cf, - 2n 

for C 0 =l and N>1. Solving the above identities for the 
coefficients in terms of the various moments, 

c i = H, 

C 2 = M ,* 1 

C 3 = 

C 4 - h 4 -6h 2 +3 


C N »* N +S(-1) ni 2 n (N-2n) !^N-2n 


for jji 0 =1 and N>1. However, these relationships are nothing 

more than the expected value of the referenced order Hermite 
polynomials , i . e . , 


C, = E[H|(z) ] = E[z] = n, 


and 


C,, = E [ H N ( z ) ] 


= E[z«] + | i( -l)" z &W 2n)l E[Z-] 

[|] 

= n!2"(N-2n) for N>1 * 


In summary, it is this observation that makes it 
convenient to generate the GCS coefficients from a data sample 
drawn from a parent measurement population. In practice, this 
"distribution-free" series expansion expresses the required 
distribution in terms of the derivatives of the standardized 
normal distribution (with zero mean and unit variance) , where 
the series coefficients are linear combinations of the random 
data sample moments. 
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To obtain a physical interpretation of these coefficients 
one should note that C,=0 and C 2 =0 as a result of the 
transformation 


from measurement coordinates to z coordinates. Here, 

x = the random measurement sample variate. 

li = the unbiased measurement mean estimate. 

a - the parent measurement population standard deviation 
estimate. 

Knowing that the odd order derivatives of *(z) are odd 

functions and the even order derivatives of 'i'(z) are even 

functions, the effects of the remaining coefficients may be 
interpreted. Excluding the first three coefficients of the 
series, the addition of odd derivative terms (i.e., with their 
respective C 3 , C s , C 7 , . . . coefficients) tends to produce 

asymmetry or skewness relative to the standardized normal 
distribution. On the other hand, the addition of even 

derivative coefficient terms tend to flatten the resulting 
distribution around its mean value. 

Technically, the C 3 and C 4 coefficients have 
statistical names. The C 3 coefficient is called the 
distribution skewness, while C 4 is the statistical excess or 
kurtosis. For example, if a unimodal distribution (i.e., with 
a single maximum) has a longer tail to the right of the central 
maximum than to the left, the distribution is said to be skewed 
to the right or to have positive skewness. If the reverse is 
true, it is said to be skewed to the left or have negative 
skewness. Thus, distributions which are perfectly symmetrical 
about their mean (i.e., such as the standardized normal) have 
zero values for the C 3 coefficient. Consequently, a true 
normal distribution is characterized by only three series 
terms, C 0 =l , C,=0 and C 2 =0 . 
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DISCRETE VERSUS CONTINUOUS FREQUENCY DISTRIBUTIONS 

B.l Discrete Frequency Concerns 

In statistical analysis, attention is often directed toward 
the enumeration of grouped data which have some particular 
attribute or characteristic. For instance, measurements falling 
into a pre-assigned interval are considered to have a common length 
attribute. A table of the relative proportions of measurements in 
each class interval specifies the relative frequency distribution 
or histogram of the inhabitants in each of the discrete class 
intervals. 

The practical difficulties involved in dealing with discrete 
data arrangements or groupings are that the relative proportion 
accuracy of grouped data cannot be relied on to produce smooth 
histogram representations of the true distribution proportions. 
This is partly due to the discrete interval quantization of finite 
measurements and the fact that the class measurement proportions 
are generally considered concentrated at the class interval mid- 
points. In theory, this latter concern can be relieved by utilized 
numerical quadrature (integration) to compensate for lumping the 
entire class interval frequency at its mid-point. 

B. 2 Numerical Quadrature Frequency Adjustment 

As noted above, numerical quadrature can be used to adjust for 
the concentration of histogram frequencies at their respective 
class interval mid-points. To explore the ramifications of this 
adjustment technique, one must first review the continuous 
distribution properties of moments. 

Letting f(x) be a continuous distribution, the integral 

J 00 f (x)dx = 1 
— 00 

characterizes the probability of the distribution f (x) . Con- 
sequently, the contribution of any Xj centered within a finite 

interval of length h in the domain of integration is given by 

f (x)dx. 


In contrast, the discrete element of area or probability attained 
from a finite histogram representation of class interval length h 
and concentrated frequency f located at Xj is hf. 

Extending this elemental notion to distribution moments, 

I* = J“ " x k f (x)dx 

& —CO 
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becomes the k th moment of f(x) in the notation of Appendix A, 
Section A. 4 and its discrete histogram counterpart is 

Mk = 1 x}f r 
J=l 

The basic problem is the development of equivalency approx- 
imations between these elemental forms. 

Under the supposition that f(x) has close contact with the 
x-axis at both extremities of the distribution: 

1) J 00 x k f(x)dx is finite for all positive values of k. 

* -<X> 

2 ) Lim [ x J f (k) (x)] = 0 for all positive integral 

±00 

values of j and k where f tk) (x) denotes the k m derivative 
of f (x) . 

The following relationship can be imposed between these elemental 
forms: 


H k = J* a> x k f (x) dx = w £ xffj. 

Here, w is the constant of proportionality. Utilizing the basic 
Newton-Stirling formula of divided differences: 

f(a+xw) = f (a)+xAf (a)+^x(x-l) A 2 f (a-w) +^(x+l) x(x-l) A 3 f (a-w)+ 
^•(x+1) x(x-l) (x-2) A 4 f (a-2w) + . . . 


and rearranging terms 


f (a+xw) 


f (a)+x[Af (a)-^A 2 f (a-w) ] + 


|yA 2 f(a-w) 


3 f(a . w) . 

Ja 4 f ( a-2w) ] + x2 (^ 2 -l) _ A 4 f ( a _ 2w ) + 


Replacing the differences of even order within the brackets by 
differences of odd order and using the divided difference 
identities, 


A 2 f(a-w) = if (a) -Af (a-w) 
A 4 f(a-2w) = A 3 f (a-w) -A 3 f (a-2w) 
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one obtains 


f (a+xw) = f (a)+x £ Af(a)+ 2 f(a ~ W) ] + ft 
x 2 (x 2 -l) f A 3 f (a-w) +A 3 f (a- 2 w) ] 

3! L 2 J 

Upon substituting Xj for a and n for x 
above expression from -f to i, 

j. 

J ^ f (Xj+nw) dn = f(Xj)+j ^2 ^ 2 f(Xj-w)- 
T 

In particular, when f(x)=e x . 


A 2 f (a-w) + 


+ »V-1) 4 4 f(a . 2w)+ .... 


and then integrating the 


17 


3.51.2 


—4 A 4 f(Xj-2w) + 


A^f (Xj-nw) = 2 2 n e Xj sinh 2n (|) . 


Transforming variables by letting 9=j and dividing throughout 
by e x , the above expression reduces to 


sinh 8 _ 
9 


1 + ztt sinh 2 © - 3 t sinh 4 0 +. . . . 
3 • J • b * 


Using this result in conjunction with the imposed 
proportionality constant in the original elemental probability 
form: 


h 

r x j 2 n 

j f(x)dx = w £ x.f,. 

J h J=1 

X J~2 

With the added approximation that h=w, one can write 


1 

w 



f (x)dx 


1 

f (Xj+nw) dn 

2 



so that 


w 


sinh 8 

8 


f (Xj) . 


Substituting this relationship into the histogram moment form 


N N 

£ Xj- k f j = £ xjf (Xj+kw) , 
j=i j=i 


where the terms on both sides of this equality are the same but 
counted differently. Therefore, 



N 

£ 


j=-i 


- sinh 8 
r J 8 
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At this point, it is convenient to introduce the functional 
E operator notation (see Reference 9) originally used by the 
English mathematician, W. P. Sheppard, to express functional 
operations in ordinary arithmetic terms. It is hypothesized that 
this correspondence resulted in his name being identified with 
this discrete moment correction procedure in the statistical 
literature. 

For the current application, the E operator takes advantage 
of a number of functional laws of correspondence that are 
interrelated with the central difference operator A. To see 
this, note that 


H - £ xff (x.+kw) - 5 x} [ A 2 E-‘ 1 " f 

J=l J*1 j L 1 i 


for some arbitrarily large integer n. To evaluate A 2 E~ l , let 

dw dw 

E=e d * and A=e dx -1 so that 


1 

T 


A 2 E _1 


- sinh 2 [ ] • 


Under this variable substitution, 0=iwD where D denotes the 

j ^ 

standard differential operator ^ and 


Mk = w 



w 2 P 2 wV w 6 D 6 

3 ! 2 2 5 ! 2 4 7 ! 2 6 


] 



^ , ^4 k(k-l) (k-2) (k-3) xj + . . . j . 

Substituting the relationship 

ir N 

— = y j^f 

w jT, *J r J 

into the above expansion 

Mk = n k + jfp k(k-l) |i k 2 + k(k-l) (k- 2 ) (k-3)n k _<+ 

Taking k=l, 2, 3, ... N in succession 


M i = i*j 

M 2 = + T2 

M 3 = ^3 + 4 A 
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"< - M, + 5 w 2 ^ + M «\ 

M s - M s + I + & wV, 


Solving these equations respectively for the ^'s in terms of the 
M/s and noting that w=h along with the condition that 

00 N 

H = J f(x)dx = £ f, = WL = 1 by the relative frequency 

U -00 j_j J u 

assumption: 

I'd = ”o 
», = M, 

*2 - M 2 - T5 «o 

»3 “ M 3 - T M, 

I 1 . = “ T **2 + 540 **o 

"5 - «s - m 3 + TT M l 
* 

K» - l(ti)(2 1 ‘"-l)B„M N . n h" 

where the B n coefficients are Bernoulli numbers. 
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